Daniel Seebacher, University of Konstanz, daniel.seebacher@uni.kn PRIMARY
Bruno Schneider, University of Konstanz, bruno.schneider@uni.kn
Michael Behrisch, Harvard University, behrisch@g.harvard.edu
Student Team: NO
KNIME, Tableau, Java + Piccolo2d, JavaScript (Angular, D3.js, …)
Approximately how many hours were spent working on this submission in total?
Michael: 10 Days 5 Hours
Daniel: ~40-50 hours.
Bruno: ~40-50 hours.
May we post your submission
in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is
complete? YES
Video
Due to sickness, only part 1 of the video has been uploaded.
Questions
MC2.1 – Characterize
the sensors’ performance and operation.
Are they all working properly at all times? Can you detect any unexpected behaviors of
the sensors through analyzing the readings they capture? Limit your response to no
more than 9 images and 1000 words.
To examine whether there are errors or incorrect measurements in the sensor
data, we created a matrix-based visualization of each monitor for the readings
of all four chemicals. The resulting overview is shown in Figure 1. Each
rectangle in the matrix represents one timestamp, and the rectangles are laid
out chronologically, row by row. Color is mapped to the reading using min-max
normalization and ranges from light gray (0) to dark blue (the maximum measured
value for a chemical). Magenta indicates missing values. There are individual
missing values for Appluimonia and Chlorodinine, which could be the result of
isolated sensor malfunctions. However, there are hundreds of missing values for
Methylosmolene across all monitors, which cannot be explained by isolated
sensor malfunctions.
Figure 1: Matrix-based visualization of each monitor for each reading. Each
rectangle in these matrix visualizations represents one timestamp, and the
rectangles are laid out chronologically, row by row.
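The color mapping described above can be sketched as follows. This is only an illustrative sketch, not the code behind Figure 1: the function name `min_max_normalize` and the use of `None` for a missing reading are assumptions made for the example. It scales against the smallest and largest observed reading, which coincides with the 0-to-maximum range described above whenever the smallest reading is 0.

```python
from typing import List, Optional


def min_max_normalize(readings: List[Optional[float]]) -> List[Optional[float]]:
    """Scale readings to [0, 1]; None marks a missing value and stays None."""
    present = [r for r in readings if r is not None]
    if not present:
        return list(readings)
    lo, hi = min(present), max(present)
    span = hi - lo
    # A constant series has no spread, so map every present value to 0.0.
    return [
        None if r is None else (0.0 if span == 0 else (r - lo) / span)
        for r in readings
    ]


# Hypothetical readings; the None entry would be drawn in magenta.
normalized = min_max_normalize([0.0, 1.0, None, 4.0])
```

A renderer would then map 0.0 to light gray, 1.0 to dark blue, and `None` to magenta.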
Switching the color scale away from a value-based mapping reveals unexpected
results. We use a new color scale that counts the number of readings per
timestamp. Normally we would expect exactly one reading per chemical per
timestamp, for example:
Monitor | Timestamp | Chemical | Reading
1 | 4/1/16 0:00 | AGOC-3A | 0.722303
1 | 4/1/16 0:00 | Appluimonia | 0.130435
1 | 4/1/16 0:00 | Chlorodinine | 1.25917
1 | 4/1/16 0:00 | Methylosmolene | 2.63064
However, there are exactly 214 occurrences where we observe multiple readings
of one chemical for a single timestamp, e.g. two readings for AGOC-3A in the
following example:
Monitor | Timestamp | Chemical | Reading
5 | 4/1/16 16:00 | AGOC-3A | 0.682529
5 | 4/1/16 16:00 | Appluimonia | 0.0176918
5 | 4/1/16 16:00 | Chlorodinine | 0.263674
5 | 4/1/16 16:00 | AGOC-3A | 6.36965
We changed the color scale to show these patterns, as shown in Figure 2. We can
immediately see that for each missing value of Methylosmolene there is an
additional reading for the chemical AGOC-3A. Since we observe this behavior
across all available monitors, we can most likely rule out individual sensor
malfunctions. Our hypothesis is that these readings for the dangerous chemical
Methylosmolene were relabeled as the harmless chemical AGOC-3A.
Figure 2: Using the number of readings per chemical per timestamp as a new color
scale, we immediately see that for each missing Methylosmolene value, we have
an additional reading of AGOC-3A.
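The per-timestamp count behind this color scale can be sketched in a few lines. The sketch below is an illustration under assumptions, not the KNIME or Java pipeline actually used: the tuple layout mirrors the (Monitor, Timestamp, Chemical, Reading) rows of the example table above, and the helper names `count_readings` and `find_anomalies` are hypothetical.

```python
from collections import Counter

CHEMICALS = ["AGOC-3A", "Appluimonia", "Chlorodinine", "Methylosmolene"]

# The example rows for monitor 5 from the table above.
rows = [
    (5, "4/1/16 16:00", "AGOC-3A", 0.682529),
    (5, "4/1/16 16:00", "Appluimonia", 0.0176918),
    (5, "4/1/16 16:00", "Chlorodinine", 0.263674),
    (5, "4/1/16 16:00", "AGOC-3A", 6.36965),
]


def count_readings(rows):
    """Count readings per (monitor, timestamp, chemical)."""
    return Counter((m, t, c) for m, t, c, _ in rows)


def find_anomalies(rows):
    """List (monitor, timestamp, duplicated, missing) for irregular slots."""
    counts = count_readings(rows)
    slots = {(m, t) for m, t, _ in counts}
    result = []
    for m, t in sorted(slots):
        # Counter returns 0 for keys it has never seen.
        duplicated = [c for c in CHEMICALS if counts[(m, t, c)] > 1]
        missing = [c for c in CHEMICALS if counts[(m, t, c)] == 0]
        if duplicated or missing:
            result.append((m, t, duplicated, missing))
    return result


anomalies = find_anomalies(rows)
```

For the example rows, the single slot is reported with AGOC-3A duplicated and Methylosmolene missing, which is the pairing visible in Figure 2.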
Additionally, we found a very interesting pattern in the readings of monitor 4.
In Figure 3 we see that the readings for Appluimonia and Chlorodinine increase
drastically with each passing month. This indicates that either the sensor has
a malfunction regarding these two chemicals, or the output of those chemicals
is indeed increasing. However, since this pattern only occurs for monitor 4, we
assume that it is a sensor malfunction.
Figure 3: Interesting pattern in the readings for the chemicals Appluimonia and
Chlorodinine for monitor 4. Over the course of April, August and December, the
readings for Appluimonia and Chlorodinine increase drastically each month.
Another pattern we observed concerns the wind direction and speed. In contrast
to the hourly sensor readings, wind measurements are only available every three
hours. We used linear interpolation to fill these gaps. However, there is a
large gap from 8/1/16 0:00 until 8/4/16 17:00, as shown by the magenta rows in
Figure 4.
Figure 4: Large magenta rows indicate a large gap in the data starting from
8/1/16 0:00 until 8/4/16 17:00
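The gap filling for the three-hourly wind data can be illustrated with a small sketch. It is written under assumptions: timestamps are reduced to integer hour offsets, the function name `interpolate_hourly` and the `max_gap` parameter are hypothetical, and gaps longer than `max_gap` hours are deliberately left unfilled, which corresponds to the magenta rows in Figure 4. The example uses wind speed only; interpolating wind direction linearly would additionally need care at the 0/360 degree boundary, and the handling used in the submission is not described here.

```python
def interpolate_hourly(samples, max_gap=3):
    """Linearly interpolate (hour, value) samples to hourly resolution.

    samples must be non-empty and sorted by hour. Gaps longer than
    max_gap hours are not filled; only their end points are kept.
    """
    hourly = {}
    for (h0, v0), (h1, v1) in zip(samples, samples[1:]):
        hourly[h0] = v0
        if h1 - h0 > max_gap:
            continue  # leave larger gaps unfilled
        for h in range(h0 + 1, h1):
            frac = (h - h0) / (h1 - h0)
            hourly[h] = v0 + frac * (v1 - v0)
    last_hour, last_value = samples[-1]
    hourly[last_hour] = last_value
    return hourly


# Hypothetical wind speeds: regular samples at hours 0 and 3, then a long gap.
speeds = interpolate_hourly([(0, 3.0), (3, 6.0), (12, 3.0)])
```

Hours 1 and 2 are filled in between the first two samples, while hours 4 to 11 stay absent and would be rendered as missing.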
MC2.2 – Now
turn your attention to the chemicals themselves. Which chemicals are being detected by the
sensor group? What patterns of chemical
releases do you see, as being reported in the data?
Limit your response to no
more than 6 images and 500 words.
We first preprocessed the data to include the anomalies measured for the
chemical Methylosmolene. To validate our hypothesis that some of the anomalies
are instances of retrievable patterns, we built an overview of all anomalies.
For each anomaly, we used a glyph representation that shows time series
matrices for the anomaly date plus/minus three days, as shown in Figure 6.
Figure 6: Close-up of one MatrixFlower Glyph showing four distinct time series
(TS) matrices for one monitor and one specific anomaly date. The cells of a TS
matrix represent every possible interval of the time series. The x and y axes
(always the adjacent leg and the opposite leg [math]) depict all possible start
and end dates, respectively. The cells on the diagonal show the actual values
of the TS; every other cell shows the mean of its interval. The four TS
matrices are arranged at increasing 90-degree angles. Here we see in the
top-left TS matrix that the wind direction was unstable in the beginning
(bottom left of the TS matrix) and then changed only slightly over the course
of the three days. The bottom-left TS matrix shows Methylosmolene. Here an
outstanding rectangle represents a single TS burst shortly before the anomaly
date.
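The construction of one TS matrix, as described in the caption of Figure 6, can be sketched as follows. This is a minimal illustration, assuming that cell (start, end) holds the mean of the series from index `start` to index `end` inclusive and that cells with `end < start` are left empty; the function name `ts_matrix` is hypothetical and this is not the glyph implementation itself.

```python
def ts_matrix(series):
    """Build a triangular matrix of interval means for a time series.

    matrix[start][end] is the mean of series[start..end] (inclusive);
    the diagonal therefore holds the original values. Cells below the
    diagonal have no valid interval and stay None.
    """
    n = len(series)
    # Prefix sums let every interval mean be computed in constant time.
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)
    matrix = [[None] * n for _ in range(n)]
    for start in range(n):
        for end in range(start, n):
            total = prefix[end + 1] - prefix[start]
            matrix[start][end] = total / (end - start + 1)
    return matrix


# Hypothetical three-point series with a burst at the end.
m = ts_matrix([1.0, 3.0, 8.0])
```

A single burst in the series raises the mean of every interval that contains it, which is why bursts show up as rectangular blocks in the matrix.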
Figure 7: Overview of all 214 anomalies, each shown as a “Matrix Flower Glyph”
representing the four values WindDirection (top left), WindSpeed (top right),
Methylosmolene (bottom left) and AGOC-3A (bottom right). Several patterns of
similar behavior can be seen in the view.
Figure 8: Similar Wind Patterns and overall
low values for the main chemicals except for one burst in the readings time
series.
Figure 9: Unspecific wind patterns often lead to homogeneous reading time
series.
Figure 10: Specific strong bursts in
Methylosmolene are often “interrupted” and followed by very low/normal
readings. Outstanding light rectangle in the lower right time series matrix.
Since these patterns appear to be quite characteristic, we are experimenting
with automatic retrieval of similar TimeSeries Matrices. For this purpose, we
calculate a feature descriptor (JCD [JCDDescriptor]) for each TimeSeries
Matrix. This composite descriptor joins two sub-descriptors, CEDD and FCTH, and
thereby combines color and texture information. We use the Tanimoto coefficient
for our similarity calculation.
Figure 11: Image-based similarity calculation for retrieving similar TimeSeries
Matrices. A connection line depicts the similarity value: a light gray,
transparent line indicates dissimilar items, and an outstanding red, opaque
line indicates a very similar TSMatrix.
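For reference, the Tanimoto coefficient between two real-valued descriptor vectors a and b is commonly defined as (a·b) / (|a|² + |b|² − a·b). The sketch below implements this definition. It is an assumption-laden illustration: extracting the JCD descriptor from an image is not shown, the example vectors are placeholders rather than real descriptors, and treating two all-zero vectors as identical is a choice made for the sketch.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length, real-valued vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical (assumption).
    return 1.0 if denom == 0 else dot / denom


# Placeholder vectors standing in for two descriptor histograms.
similarity = tanimoto([1.0, 1.0, 0.0], [1.0, 0.0, 0.0])
```

Identical vectors yield 1.0 and non-overlapping non-negative vectors yield 0.0, so the value can be mapped directly to the color and opacity of a connection line.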
To approach the question of which value for AGOC-3A is the correct one, we use
an anomaly component that shows both alternatives (either the first or the
second value is correct).
Figure 12: Anomaly Detail View. In a
comparative view we can examine the time series behavior for either only the
first (AGOC-3A1) of the reading values for AGOC-3A or the second (AGOC-3A2).
Here we see that likely AGOC-3A1 was used to defer the sensor values for
Methylosmolene.
MC2.3 – Which factories are
responsible for which chemical releases? Carefully describe how you determined
this using all the data you have available. For the factories you identified,
describe any observed patterns of operation revealed in the data.
Limit your response to no
more than 8 images and 1000 words.
To find out which factories are responsible for which chemical releases, we
follow the information-seeking mantra. We start with an overview visualization
of all chemical readings, for all monitors, for each timestamp, as shown in
Figure 13. In this visualization we can see several eye-catching patterns of
chemical releases, which we investigate to find the probable polluters:
Use-Case 1 (red), Use-Case 2 (green), Use-Case 3 (purple), Use-Case 4 (blue)
and Use-Case 5 (yellow). For these use cases we show how, by extending our
application to incorporate the wind direction and speed, we can identify the
most probable sources of the pollution.
Figure 13: Overview visualization showing all chemical readings of all monitors
for each timestamp. Highlighted are the different use cases, which we
investigate further: Use-Case 1 (red), Use-Case 2 (green), Use-Case 3 (purple),
Use-Case 4 (blue) and Use-Case 5 (yellow).
Use-Case 1 (red):
Here we take a closer look at the readings for the chemical Chlorodinine of
monitor 6, which exhibits the most peaks in the readings. By zooming in and
extending the visualization to show the wind direction and speed, we obtain the
visualization shown in Figure 14. The arrow direction indicates where the wind
is originating from, and the arrow length indicates the wind speed. We can see
that whenever there is a very high Chlorodinine reading, the wind is
originating from west-south-west. By placing these arrows on the map, we can
immediately see the source of pollution, as shown in Figure 15. Our
investigation shows that the most probable source of the Chlorodinine pollution
is Kasios.
Figure 14: High-detail view of the readings for the chemical Chlorodinine of
Monitor 6. Arrow direction indicates the origin of wind and arrow length
indicates wind speed.
Figure 15: Source of pollution of the chemical Chlorodinine measured at monitor
6. We can see that Kasios is the most probable polluter.
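The reasoning in this use case, matching the wind origin at the time of a peak against the direction in which each factory lies as seen from the monitor, can be sketched as below. Everything concrete in the sketch is a labeled assumption: the grid coordinates and the factory names `FactoryA` to `FactoryC` are placeholders rather than the real map positions, the 20-degree tolerance is arbitrary, and the helper names are hypothetical. Wind direction is taken as the compass direction the wind comes from (0 = north, 90 = east).

```python
import math


def bearing_deg(origin, target):
    """Compass bearing (0 = north, 90 = east) from origin to target on an x/y grid."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def angular_diff(a, b):
    """Smallest absolute difference between two compass angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def upwind_sources(monitor, factories, wind_from_deg, tolerance_deg=20.0):
    """Names of factories lying in the direction the wind is coming from."""
    return sorted(
        name
        for name, position in factories.items()
        if angular_diff(bearing_deg(monitor, position), wind_from_deg) <= tolerance_deg
    )


# Placeholder layout: one factory west, one north, one east of the monitor.
monitor = (10.0, 10.0)
factories = {
    "FactoryA": (0.0, 10.0),
    "FactoryB": (10.0, 20.0),
    "FactoryC": (20.0, 10.0),
}
candidates = upwind_sources(monitor, factories, wind_from_deg=270.0)
```

With wind coming from the west (270 degrees), only the placeholder factory west of the monitor remains as a candidate. Repeating this for every peak reading and counting how often each factory is upwind gives a simple numeric counterpart to placing the arrows on the map.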
Use-Case 2 (green):
We see a stark increase in the pollutants Appluimonia and Chlorodinine with
each month for monitor 4 and a very irregular pattern of measurements for
monitor 3 in Figure 16. However, a closer look at the data shows that there is
no consistent relationship between high readings and wind direction. This
indicates that the chemical readings are either faulty, or that the pollution
originates from somewhere else.
Use-Case 3 (purple):
This might be the most interesting case, since, as we showed in MC2.2, there is
a consistent manipulation of the Methylosmolene readings so that they show up
as AGOC-3A readings. Since Methylosmolene is a very dangerous chemical and
AGOC-3A is considered harmless, such a manipulation would make sense. If we
compare the wind direction at the times where we have no values at monitor 6,
we can see that for each missing value the wind is originating from the east.
This indicates that either Kasios or Roadrunner is responsible for the
Methylosmolene pollution, but we cannot determine which of them, or whether one
of them changed the Methylosmolene readings to show up as AGOC-3A readings.
Use-Case 4 (blue):
The readings for the chemical Appluimonia from monitor 9 show that the high
readings occur when the wind originates from the north, i.e. from inside the
nature preserve. This indicates that the source is not one of the companies,
since no company is located north of monitor 9.
Use-Case 5 (yellow):
For the chemical AGOC-3A we see that high readings coincide with wind
originating from the east, again indicating that Roadrunner or Kasios is the
source of this pollutant.
Conclusion:
During our examination of the data we found a lot of circumstantial evidence
indicating that Kasios is the main polluter for the chemical Chlorodinine.
Additionally, we found that Kasios and Roadrunner are the most likely polluters
for the chemicals Methylosmolene and AGOC-3A. We see a very interesting pattern
for the chemicals Chlorodinine and Appluimonia at monitors 3 and 4. However, we
cannot determine a source of pollution there, since there is no clear pattern
in the wind direction. Finally, we saw clear pollution by the chemical
Appluimonia measured at monitor 9, but the wind is originating from the north,
i.e. from inside the nature preserve. This indicates that there is an
additional source of pollution that is not one of the companies.
REFERENCES:
[JCDDescriptor] Zagoris, Konstantinos, et al. "Automatic image annotation and
retrieval using the joint composite descriptor." 14th Panhellenic Conference on
Informatics (PCI). IEEE, 2010.